Abstract

A novel coding technique is presented for signal prediction with applications in- 
cluding speech coding, system identification, and estimation of input excitation. The 
approach is based on the blind equalization method for speech signal processing in con- 
junction with the geometric subspace projection theory to formulate the basic prediction 
equation. The speech-coding problem is often divided into two parts, a linear prediction 
model and excitation input. The parameter coefficients of the linear predictor and the 
input excitation are solved simultaneously and recursively by a conventional recursive 
least-squares algorithm. The excitation input is computed by coding all possible out- 
comes into a binary codebook. The coefficients of the linear predictor and excitation, 
and the index of the codebook can then be used to represent the signal. In addition, 
a variable-frame concept is proposed to block the same excitation signal in sequence in 
order to reduce the storage size and increase the transmission rate. The results of this 
work can be easily extended to the problem of disturbance identification. The basic 
principles are outlined in this report and differences from other existing methods are 
discussed. Simulations are included to demonstrate the proposed method. 


1 INTRODUCTION 

In the past decade, a number of advanced technologies have been employed to represent 
speech signals digitally for use in communication-related operations such as audio transmis- 
sion, storage, manipulation, speech recognition, and even speech synthesis. These operations 
can be performed more efficiently by reducing the amount of information needed to represent
a given speech signal. The term “speech coding”, or simply “coding”, is thus introduced
in speech processing. In speech coding, a major objective is to represent the digital signal
with as few bits as possible, i.e., to compress the signal. The degree of compression depends
on the cost of transmission or storage, the cost of coding the digital speech signal, and
speech quality requirements. Before 1980, the high cost of coding and low speech quality
made speech coding impractical. However, with improved digital signal processing
hardware and significant progress in speech coding research, speech coding is now
widely used in a variety of applications. Speech coding techniques proposed and developed
over the past decade can be divided into two general categories: waveform coders and voice
coders (vocoders) [1].

In most waveform coding techniques, the samples are processed by scalar
quantization. A scalar quantizer operates on a single sample at a time and represents each
sample by a sequence of levels through a mapping function. The output of the quantizer, 
namely the quantized signal, can hence be coded by binary digits. On the other hand, a 
block of samples may be quantized as a single entity through a mapping function, which is 
called vector quantization. 
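As a small illustration of the scalar case, the following sketch implements a uniform mid-rise quantizer. The function name `quantize`, the `full_scale` range, and the choice of a uniform step are assumptions made for this example only; the text does not prescribe a particular mapping function.

```python
def quantize(sample, n_bits, full_scale=1.0):
    """Uniform scalar quantizer over [-full_scale, +full_scale].

    Returns (index, reconstructed_value). The index is what would be
    coded with n_bits binary digits. Hypothetical helper for illustration.
    """
    levels = 2 ** n_bits
    step = 2.0 * full_scale / levels
    # Map the sample to a cell number, then clamp to the valid range.
    index = int((sample + full_scale) / step)
    index = max(0, min(levels - 1, index))
    # Reconstruct at the midpoint of the selected cell.
    value = -full_scale + (index + 0.5) * step
    return index, value


index, value = quantize(0.3, n_bits=2)
print(index, value)
```

A vector quantizer would instead map a whole block of samples to a single index in one step.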

In contrast with waveform coding, voice coding divides the speech problem into two
parts: part one creates an analytical model of the vocal tract, and part two synthesizes an
analytical representation of the input excitation. The true input is never measured, but
the idea is to reconstruct the recorded signal by convolving the analytical model with a
synthesized input. Typically, the analytical model structure is assumed to have all poles
and the synthesized input is assumed to be a periodic impulse train with period equal to the
pitch period (the reciprocal of the fundamental frequency). For unvoiced speech, the excitation is a white noise sequence [1, 2].

Linear Predictive Coding (LPC or LP) is a voice coding approach widely used in practice 
today. The objectives of LP analysis are to estimate the coefficients of an all-pole model 
representing the vocal tract, to determine analytically the type of excitation, and to estimate 
the fundamental frequency and its gain coefficients. Different LPC-type speech analysis and
synthesis schemes differ primarily in the type of input signal which is generated for speech
synthesis. Several schemes have been proposed for generating the input signal: the residual
excited linear prediction (RELP) vocoder, the multipulse LPC vocoder, code-excited linear
prediction (CELP) [3], and vector sum excited linear prediction (VSELP) [4].

The advanced speech coders since the 1990s are based on the LPC scheme using vector
quantization (VQ). In the LPC-type vocoder, the bulk of the transmission rate is used
to transfer the synthesized excitation sequence. Therefore, how to synthesize the excitation
efficiently and effectively becomes very important. In [5], the vector quantization introduced
includes codebook coding, tree coding, and trellis coding. However, as most of the coders
use codebook coding, this method is of particular interest. In codebook coding, the set
of possible output sequences is arranged in a codebook whose elements are not restricted
in any way. Once the optimum output sequence is found, the corresponding index of
that sequence is transmitted. In practice, an exhaustive codebook search becomes impractical when the
codebook is large. Some effort has been devoted to speeding up the search for the optimum sequence [6], for
example the binary tree search. The proper design of the codebook is the key to successful
LPC-type speech coding.

The conventional blind equalization is aimed at recovering the input signal applied to 
a linear time-invariant system from the observed signal developed at the output of the 
system [7]. In other words, blind equalization is a special kind of adaptive inverse filtering 
that operates without access to the source of the input signal. In digital communications, 
the input signal is commonly called the transmitted signal. The time-invariant system is 
referred to as the channel. There are two general approaches developed to achieve this
task: the Constant Modulus Algorithm (CMA) [7, 8] and the Decision Directed (DD) [9, 10]
equalizer. The main idea of the CMA is to keep the output of the equalizer at constant modulus (absolute
value) [7, 8, 11]. The input signal will have some known property, which helps determine 
how the observed signal has been corrupted. In [12, 13], the blind adaptive prediction 
exploited the constant modulus property to keep the prediction error at each estimate 
within predefined bounds. The use of constant modulus is to modulate the prediction error 
to a constant value. Once the prediction error is modulated to a sufficiently small value, the 
prediction derived from blind equalization becomes reliable. Applying the idea of constant
modulus to speech coding enables the excitation to be quantized at the same time as the LPC
coefficients are updated. Note that the equalizer output is the recovered input signal, which
is similar to the LPC-based quantized input. However, this leaves open the question of how
small the given modulus needs to be [14]. Furthermore, it will be shown in this
paper that the blind adaptive prediction, which only contains a unit modulus, is unable to
obtain the steady LPC solution for speech coding. A natural alternative is to expand the
single modulus to multiple moduli. As a result, the adaptive multi-modulus blind
predictor is proposed in this paper.

In [15], the multiple modulus concept was proposed to deal with blind equalization of
signals such as Multilevel Quadrature-Amplitude Modulation (M-ary QAM). The major
difference is that, contrary to the approaches presented in [15, 16], our proposed adaptive
multi-modulus blind predictor does not specify the modulus a priori. Moreover, our
approach combines recursive least-squares with DD approaches to determine the modulus
recursively [17]. In contrast, the first proposed MMA (multiple modulus algorithm) [15]
uses a straightforward generalization of the CMA cost function to derive its update and the 
second one, DAMA (decision adjusted modulus algorithm), is a hybrid of the CMA and the 
DD approaches. 

In this paper, we propose a novel coding technique for speech compression. The tech- 
nique does not require separate solutions for the equalizer coefficients and input quantiza- 
tion. The equalizer coefficients in this paper are nothing but the coefficients of an all-pole 
model. The approach is to integrate the input identification into the adaptive estimation of 
the equalizer coefficients. The goal is to make the proposed technique feasible for real-time 
implementation in practice. The estimation of equalizer coefficients and the input identifi- 
cation are obtained recursively by the coefficient smoothing technique [17]. The input signal 
is generated without using a separate quantization scheme when the predictor is updated. 
The input code book is derived analytically instead of generating it based on the stochastic 
assumptions [18]. After the entire process is completed, the parameters to be quantized before
transmission or storage are the coefficients, the gains of the input, and the index of the input
sequence. Geometric subspace concepts lead to a thorough explanation of
the proposed technique.

Regarding the aim of low bit rate coding, the conventional coders usually deal with the
speech by frames of samples. However, this may not be the best way to
describe the non-stationary behavior of the sound sources. On the other hand, precise
sample-by-sample coding can produce high quality with negligible coding
distortion and negligible coding delay. The blind adaptive prediction was originally proposed
to perform the prediction sample by sample, to overcome the problems that the
conventional LP model suffers from because of the stationarity assumptions involved. As a result, the
approach proposed in this paper includes a variable frame concept for transmission
and storage. The sample-by-sample scheme is used to do precise coding and obtain
high quality. Runs of identical excitation signal are then blocked into frames to be transmitted or
stored. Hence, the resulting bit rate can be reduced.

2 ALGORITHM


This section begins with a brief description of conventional linear predictive coding and 
blind equalization. The linear predictive techniques were developed mainly for speech coding,
whereas the blind equalization techniques were derived for input identification, i.e.,
transmitted signal recovery for digital communications.

2.1 Linear Prediction (LP) 

Linear prediction techniques were first used for speech analysis and synthesis by Itakura and
Saito [19], and Atal and Schroeder [20], which fostered further work in coding, recognition,
enhancement and so on [2, 21, 22, 23]. A general flowchart of the LP modeling is shown in
figure 1, and the predicted value is defined by

$$\hat{x}(k) + \sum_{i=1}^{n} \theta_i \hat{x}(k-i) = u(k) \tag{1}$$

where $\hat{x}(k)$ is the synthesized speech signal and $u(k)$ is its quantized input signal at time
$k$. Equation (1) shows that the synthesized signal $\hat{x}(k)$ is a linear function of the current
input signal $u(k)$ and its past values $\hat{x}(k-i)$ weighted by the constant tap weights $\theta_i$ for
$i = 1, 2, \ldots, n$, where $n$ is an integer greater than zero. In signal processing, Eq. (1) is called
the closed-loop formulation for computing the synthesized signal $\hat{x}(k)$. The tap weights
$\theta_i$ are commonly called the LP coefficients, which constitute an all-pole model. The LP
coefficients $\theta_i$ for $i = 1, 2, \ldots, n$ and the quantized input signal $u(k)$ for $k = 1, 2, \ldots, \ell$, with
$\ell$ being the data length, may be obtained by minimizing the squared error between the
real and synthesized speech signals, which is defined by

$$J = E\{\, |x(k) - \hat{x}(k)|^2 \,\} \tag{2}$$

where $E\{\cdot\}$ denotes the expectation operator. Accordingly, a number of methods such as
the autocorrelation method, the covariance method, the lattice method, and so forth [24]
were developed as formulations for searching for LP solutions.

Figure 1: LPC modeling diagram

Figure 2: Blind Equalization
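To make the autocorrelation method concrete, the sketch below estimates the LP coefficients with the standard Levinson-Durbin recursion, using the sign convention of Eq. (1), i.e. $x(k) + \sum_i \theta_i x(k-i) = u(k)$. This is the textbook recursion rather than anything specific to this paper, and the function names `autocorrelation` and `levinson_durbin` are chosen here for illustration.

```python
def autocorrelation(x, max_lag):
    """Biased (unnormalized) autocorrelation r[0..max_lag] of a sequence."""
    return [sum(x[k] * x[k - lag] for k in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for theta_1..theta_order.

    Convention: x(k) + sum_i theta_i * x(k - i) = u(k).
    Returns (theta, residual_energy).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m.
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


# Autocorrelation of an ideal first-order process with r[lag] = 0.5**lag.
theta, residual = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(theta, residual)
```

For this idealized autocorrelation sequence the recursion returns a first coefficient of -0.5 and a second coefficient of zero, i.e. the model $x(k) = 0.5\,x(k-1) + u(k)$.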

2.2 Blind Equalization 

Blind equalization in digital communication is a deconvolution process to recover the input
signal applied to a time-invariant system. It is a special kind of adaptive inverse filtering
that operates without access to the transmitted signal (i.e., the input signal). The name “blind
equalization” refers to the ability of an adaptive algorithm to perform deconvolution in a
blindfolded or self-recovered fashion. Figure 2 shows a general flowchart where the channel
includes the combined effects of a transmit filter, a transmission medium, and a receiver
filter that may be represented by a linear time-invariant system or a linear predictor. The
objective is to estimate the coefficients of the linear predictor and the input signal, given
the observed signal $x(k)$ for $k = 1, 2, \ldots, \ell$. The input signal is generally in the form of a
binary sequence.

In [12, 13], a procedure was proposed for estimation of the LP coefficients. It was aimed 
at overcoming the difficulties inherent in the non-stationarity of the signals to be modeled. 
The approach is to keep the prediction error of representation at each estimate to within 
a predefined set of bounds rather than minimize the mean squared error in Eq. (2). The 
output of the equalizer (i.e., the recovered input signal) can be obtained by 

$$u(k) = \sum_{i=0}^{n} \theta_i(k)\, x(k-i) \tag{3}$$

where $\theta_i(k)$ for $i = 0, \ldots, n$ are coefficients to be determined at each time step $k$. Note
that Eq. (3) uses the observed quantity $x$ rather than the synthesized signal $\hat{x}$ in Eq. (1).
Equation (3) is commonly called the open-loop formulation in signal processing, whereas
Eq. (1) is referred to as the closed-loop formulation.

In 1980, Godard proposed a family of constant modulus blind equalization algorithms
for use in two-dimensional digital communication systems [7]. Among them, the so-called
CMA (constant modulus algorithm) is derived from the cost function

$$J_{\mathrm{CMA}} = E\{\, (u(k)^2 - 1)^2 \,\} \tag{4}$$

Once equalization is achieved, the output of the equalizer is driven toward
$\pm 1$. The LP model from Eq. (3) can be modified by normalizing the coefficients $\theta_i(k)$ at
each time step $k$ for $i = 0, \ldots, n$ such that $\theta_0(k) = 1$ to yield

$$u(k) = x(k) + \sum_{i=1}^{n} \theta_i(k)\, x(k-i) \tag{5}$$

The coefficient normalization is equivalent to scaling the observed signal $x(k)$ for $k =
1, 2, \ldots, \ell$ such that its absolute maximum value is unity. Equation (5) is identical to the
LP model shown in Eq. (1) except that it is an open-loop formulation, which will be explained
later.
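The relationship between Eq. (5) and the cost in Eq. (4) can be illustrated numerically. The sketch below evaluates the open-loop equalizer output for a fixed (time-invariant) coefficient vector and replaces the expectation in Eq. (4) with a sample average; both simplifications, and the function names, are assumptions of this example.

```python
def equalizer_output(x, theta):
    """Open-loop output u(k) = x(k) + sum_i theta_i * x(k - i), Eq. (5).

    theta holds theta_1..theta_n and is kept fixed over time here.
    Output starts at the first sample that has n past values available.
    """
    n = len(theta)
    return [x[k] + sum(theta[i] * x[k - 1 - i] for i in range(n))
            for k in range(n, len(x))]


def cma_cost(u):
    """Sample-average version of the constant modulus cost, Eq. (4)."""
    return sum((v * v - 1.0) ** 2 for v in u) / len(u)


x = [1.0, -1.0, 1.0, -1.0]
print(cma_cost(equalizer_output(x, [0.0])))  # output stays at +/-1
print(cma_cost(equalizer_output(x, [1.0])))  # output collapses to 0
```

With a zero coefficient the output already has unit modulus and the cost vanishes; with a coefficient of one the output is identically zero and the cost is one.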

Using the blind equalization method, the LP modeling problem is formulated as a constrained
optimization problem

$$\min \; \|\Theta(k+1) - \Theta(k)\|_2^2 \quad \text{subject to} \quad |x(k) + X^T(k-1)\Theta(k)| = \epsilon$$

where $\epsilon$ is a predefined positive constant. The coefficient vector $\Theta(k)$ contains the
coefficients of the equalizer,

$$\Theta(k) = [\theta_1(k) \;\; \theta_2(k) \;\; \cdots \;\; \theta_n(k)]^T$$

and $X(k-1)$ is an observed sequence,

$$X(k-1) = [x(k-1) \;\; x(k-2) \;\; \cdots \;\; x(k-n)]^T$$

This constrained optimization problem was reformulated in [12, 13] to a Lagrange equivalent 
and the stochastic gradient descent strategy was adopted to solve for the LP coefficient 
vector. These algorithms need iterations between each consecutive sample to ensure that 
the constraints are satisfied. The prediction implemented to illustrate the performance was 
open-loop prediction. Although it is not a closed-loop prediction as required in speech 
coding, the prediction algorithm may provide a new solution for input identification and
quantization because of the predefined constant value of the equalizer output. Indeed, the 
definition of the equalizer output in Eq. (5) may be considered as the binary input of the 
LPC-type vocoders. 


2.3 Adaptive Multi-Modulus Blind Equalization 


To make the blind equalization algorithm applicable to speech coding (i.e., closed-loop
prediction), the conventional algorithm must be modified and extended. A new approach
is derived in this section. The value $\epsilon$ will be determined autonomously and
adaptively rather than picked arbitrarily. The input signal is not restricted to the predefined
single constant value commonly used in the blind adaptive prediction (see Eq. (3)).

Let the values of the output of the equalizer from Eq. (5) be the series sum of a binary 
stream multiplied by scalars, 

$$u(k) = -\sum_{i=1}^{N} \phi_i(k)\, \delta_i(k) \tag{6}$$

where $\delta_i(k) \in \{\pm 1\}$ is a binary stream and $\phi_i(k)$ is a weighting coefficient. The goal is to
make the weighting coefficient $\phi_i(k)$ invariant with respect to $k$. Substituting Eq. (6) into
Eq. (5) yields

$$x(k) + X^T(k-1)\Theta(k) = -\sum_{i=1}^{N} \delta_i(k)\, \phi_i(k) = -\Delta^T(k)\Phi(k) \tag{7}$$


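where $\Delta(k) = [\delta_1(k) \;\; \delta_2(k) \;\; \cdots \;\; \delta_N(k)]^T$ and $\Phi(k) = [\phi_1(k) \;\; \phi_2(k) \;\; \cdots \;\; \phi_N(k)]^T$. Equation (7)
can be rearranged to become

$$x(k) = -X^T(k-1)\Theta(k) - \Delta^T(k)\Phi(k) = -\begin{bmatrix} X^T(k-1) & \Delta^T(k) \end{bmatrix} \begin{bmatrix} \Theta(k) \\ \Phi(k) \end{bmatrix} \tag{8}$$

Define two new quantities

$$V(k-1) = \begin{bmatrix} X(k-1) \\ \Delta(k) \end{bmatrix}, \qquad \Psi(k) = \begin{bmatrix} \Theta(k) \\ \Phi(k) \end{bmatrix} \tag{9}$$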

Equation (8) becomes

$$x(k) = -V^T(k-1)\Psi(k) \tag{10}$$

In practice for speech coding, the coefficient vector $\Psi(k)$ should be constrained such that

$$\Psi(k+1) = \Psi(k) = \Psi \tag{11}$$

for all $k$, i.e., independent of $k$. Imposing the constraint, Eq. (10) becomes

$$x(k) = -V^T(k-1)\Psi \tag{12}$$

The gain coefficient vector, $\Phi$, is now integrated into the newly defined coefficient vector $\Psi$
and can be seen as a deterministic gain vector of the input excitation $u$.

Schroeder and Atal [3] used a Gaussian codebook to encode the input. After examining 
the first-order cumulative probability distribution function for the prediction residual, they 
found it resembled a corresponding Gaussian distribution function with the same mean and 
variance. In contrast, our prediction residual from the blind equalization is only a binary
stream, $\Delta(k)$. A binary codebook may be introduced to determine and encode the input.
Binary coding is less restrictive than the conventional Gaussian codebook for generating 
the input, because it does not impose any assumption on the stochastic process. 

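It is quite simple to generate the binary codebook. For example, a 2-bit codebook
consists of 4 code vectors (columns) as follows

$$\begin{pmatrix} 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \end{pmatrix}$$

and a 3-bit codebook has the form

$$\begin{pmatrix} 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\ 1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 \end{pmatrix}$$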

An $N$-bit codebook requires a collection of $2^N$ codes which can be used in Eq. (10) to
estimate $u(k)$. By selecting the minimum error, which results from the difference between
the $2^N$ estimated values and the original signal, the optimum $N$-bit input binary sequence
can be determined. Its corresponding index, which is in a binary format using $N$ bits, can
then be stored or transmitted.
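The codebook construction and the exhaustive search over its $2^N$ entries can be sketched as follows. The function names, and the convention of passing the predictor contribution $-X^T(k-1)\Theta$ as a precomputed scalar `past_part`, are assumptions of this example; the predicted value follows Eq. (10) with fixed gains.

```python
from itertools import product


def binary_codebook(n_bits):
    """Return all 2**n_bits code vectors with entries in {+1, -1}.

    Index 0 is the all +1 vector and the last bit toggles fastest,
    which reproduces the column order of the 3-bit codebook in the text.
    """
    return [list(code) for code in product((1, -1), repeat=n_bits)]


def best_code(x_k, past_part, gains, codebook):
    """Pick the code vector that minimizes |x(k) - x_hat(k)|.

    past_part is the scalar -X^T(k-1) Theta and gains holds phi_1..phi_N,
    so that x_hat(k) = past_part - sum_i phi_i * delta_i.
    Returns (index, absolute_error).
    """
    best_index, best_err = 0, float("inf")
    for index, code in enumerate(codebook):
        x_hat = past_part - sum(p * d for p, d in zip(gains, code))
        err = abs(x_k - x_hat)
        if err < best_err:
            best_index, best_err = index, err
    return best_index, best_err


codebook = binary_codebook(2)
index, err = best_code(0.62, 0.5, [0.2, 0.1], codebook)
print(index, codebook[index], err)
```

Only the selected index needs to be stored for each sample; the decoder regenerates the same codebook from $N$.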
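
Solving Eq. (12) for the constant vector $\Psi$ is quite simple, assuming that the optimum
$N$-bit input binary sequence $\Delta(k)$ for all $k$ is known a priori. Defining the stacked quantities
(set in boldface to distinguish them from the per-sample vectors $X(k)$ and $V(k)$)

$$\mathbf{X} = [x(n+1) \;\; x(n+2) \;\; \cdots \;\; x(\ell)]^T \tag{13}$$

$$\mathbf{V} = [V(n) \;\; V(n+1) \;\; \cdots \;\; V(\ell-1)] = \begin{bmatrix} X(n) & X(n+1) & \cdots & X(\ell-1) \\ \Delta(n+1) & \Delta(n+2) & \cdots & \Delta(\ell) \end{bmatrix} \tag{14}$$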


Equation (12) thus produces

$$\mathbf{X} = -\mathbf{V}^T \Psi \tag{15}$$

The quantity $\mathbf{X}$ is an $(\ell-n) \times 1$ vector, $\Psi$ is an $(n+N) \times 1$ vector, and $\mathbf{V}^T$ is an $(\ell-n) \times (n+N)$
matrix. From Eq. (15), there exists a solution for $\Psi$ if and only if the vector $\mathbf{X}$ is in the
column space generated by the columns of $\mathbf{V}^T$. For the case where $(\ell-n) > (n+N)$
(i.e., more equations than unknowns), it is generally impossible to satisfy this necessary
and sufficient condition unless the signal to be synthesized is generated from a noise-free
finite-dimensional linear system. The optimum solution is then the least-squares solution,
i.e.,

$$\hat{\Psi} = -(\mathbf{V}^T)^{\dagger} \mathbf{X} \tag{16}$$

where $\dagger$ denotes the pseudo-inverse and $\hat{\Psi}$ denotes the estimate of $\Psi$, that is,

$$\hat{\Psi} = \begin{bmatrix} \hat{\Theta}(k) \\ \hat{\Phi}(k) \end{bmatrix} \tag{17}$$

for any $k$. The least-squares solution minimizes the equation error between the real signal
$\mathbf{X}$ and the estimated signal $\hat{\mathbf{X}}$, i.e.,

$$\hat{\mathbf{X}} = -\mathbf{V}^T \hat{\Psi} \tag{18}$$

Note that the quantity $\hat{\mathbf{X}}$ is an open-loop estimate and thus not a synthesized speech
signal.
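For reference, the pseudo-inverse in Eq. (16) has a closed form under one additional assumption that the text does not state: that the stacked matrix of Eq. (14), written $\mathbf{V}$ here, has full row rank $n+N$, so that $\mathbf{V}\mathbf{V}^T$ is invertible. With $\mathbf{X}$ denoting the stacked signal vector of Eq. (13), the least-squares estimate and the orthogonality of its residual then read:

```latex
\begin{aligned}
(\mathbf{V}^T)^{\dagger} &= (\mathbf{V}\mathbf{V}^T)^{-1}\mathbf{V}, \\
\hat{\Psi} &= -(\mathbf{V}\mathbf{V}^T)^{-1}\mathbf{V}\,\mathbf{X}, \\
\mathbf{V}\,(\mathbf{X} - \hat{\mathbf{X}})
  &= \mathbf{V}\mathbf{X} + \mathbf{V}\mathbf{V}^T\hat{\Psi}
   = \mathbf{V}\mathbf{X} - \mathbf{V}\mathbf{V}^T(\mathbf{V}\mathbf{V}^T)^{-1}\mathbf{V}\mathbf{X}
   = 0 .
\end{aligned}
```

The last line states that the residual is orthogonal to every column of $\mathbf{V}^T$, which is the geometric (subspace projection) reading of the least-squares solution.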

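One problem remains to be solved. The optimum $N$-bit input binary sequence

$$\boldsymbol{\Delta} = [\Delta(n+1) \;\; \Delta(n+2) \;\; \cdots \;\; \Delta(\ell)]$$

needs to be determined. Given any sequence, say

$$\boldsymbol{\Delta}_i = [\Delta_i(n+1) \;\; \Delta_i(n+2) \;\; \cdots \;\; \Delta_i(\ell)]$$

let us denote the least-squares prediction error to be

$$\mathrm{Err}_i = \mathbf{X} - \hat{\mathbf{X}}_i = \mathbf{X} + \mathbf{V}_i^T \hat{\Psi}_i \tag{19}$$

where

$$\hat{\mathbf{X}}_i = -\mathbf{V}_i^T \hat{\Psi}_i \tag{20}$$

and

$$\hat{\Psi}_i = -(\mathbf{V}_i^T)^{\dagger} \mathbf{X}, \qquad \mathbf{V}_i = \begin{bmatrix} X(n) & X(n+1) & \cdots & X(\ell-1) \\ \Delta_i(n+1) & \Delta_i(n+2) & \cdots & \Delta_i(\ell) \end{bmatrix} \tag{21}$$

Among all possible choices of $\boldsymbol{\Delta}_i$ for $i = 1, \ldots, (2^N)^{\ell}$, one should pick the one that minimizes
the norm of $\mathrm{Err}_i$. There are several ways of determining the optimum coefficient vector $\hat{\Psi}$
and the optimum $N$-bit input binary sequence $\boldsymbol{\Delta}$ from Eq. (16). The key idea is to choose
the $N$-bit input binary sequence such that the column space generated by the columns of $\mathbf{V}^T$
contains as much of the vector $\mathbf{X}$ as possible.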

Here we introduce a recursive least-squares technique that minimizes the cost function
defined as

$$\min \; \Big\| \sum_{i=1}^{k} \lambda^{k-i} \big[ x(i) + V^T(i-1)\hat{\Psi}(k) \big]^2 \Big\| \tag{22}$$

where $\hat{\Psi}(k)$ is the smoothing coefficient vector and $0 < \lambda < 1$ is the forgetting factor weighting
the data. The most recent data is given unit weight, but data that is $m$ time steps old is
weighted by $\lambda^m$. The method is commonly called exponential forgetting.

The recursive least-squares algorithm is summarized in the following. At time index
$k$, choose the $N$-bit input $\Delta(k)$ among all possible binary combinations such that the
estimation error is minimum, i.e.,

$$\hat{x}(k) = -V^T(k-1)\hat{\Psi}(k-1) = -\begin{bmatrix} X^T(k-1) & \Delta^T(k) \end{bmatrix} \begin{bmatrix} \hat{\Theta}(k-1) \\ \hat{\Phi}(k-1) \end{bmatrix} \tag{23}$$

$$e_{\min}(k) = \min_{\Delta(k)} \| x(k) - \hat{x}(k) \| \tag{24}$$


Both the coefficient vector $\hat{\Theta}$ of the blind predictor and the gain coefficient vector $\hat{\Phi}$ of the
input can be updated recursively by

$$G(k) = \frac{P(k-1)V(k-1)}{\lambda + V^T(k-1)P(k-1)V(k-1)} \tag{25}$$

$$P(k) = \frac{1}{\lambda}\big[ I - G(k)V^T(k-1) \big] P(k-1) \tag{26}$$

$$\hat{\Psi}(k) = \hat{\Psi}(k-1) + e_{\min}(k)\, G(k) \tag{27}$$


where $G(k)$ is the update gain determined by the matrix $P(k-1)$, the vector $V(k-1)$, and
the scalar $\lambda$. The initial values $P(0)$ and $\hat{\Psi}(0)$ can be arbitrarily assigned. Conventionally,
$P(0)$ and $\hat{\Psi}(0)$ are assigned as $dI_{n+N}$ and $0_{(n+N) \times 1}$, respectively, where $d$ is a large positive
number, $I_{n+N}$ is an identity matrix of dimension $(n+N) \times (n+N)$, and $0_{(n+N) \times 1}$ is a
zero matrix of dimension $(n+N) \times 1$. The estimated coefficient vector $\hat{\Psi}$ is the converged
coefficient vector at $k = \ell$, that is,

$$\hat{\Psi} = \hat{\Psi}(\ell) \tag{28}$$

It is known from the recursive least-squares algorithm that the initialization introduces a
bias into the parameter estimate $\hat{\Psi}$ produced by the recursive least-squares method. For
large data lengths, the exact value of the initialization constant is not important. It is noted 
that some accuracy may be lost when a least-squares problem is solved using the classical 
approach as described in this section. The reason is that the input and output data are 
squared to compute the data correlation. There is another method based on orthogonal 
transformation to avoid the computation of data correlation for the least-squares estimates. 
The method is commonly called a square root method [27, 28], because it works with the 
square root of the data correlation. 
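The recursion of Eqs. (23)-(27) can be sketched in plain Python as below. Several details are assumptions of this sketch rather than statements from the text: the signed error $x(k) - \hat{x}(k)$ is used in the update, the correction is subtracted because the regressor implied by Eq. (12) is $-V(k-1)$, ties in the codebook search are broken in favor of the lowest index, and the function name `amm_rls` and its default arguments are hypothetical.

```python
from itertools import product


def amm_rls(x, n, n_bits, lam=0.999, d=1000.0):
    """Sample-by-sample sketch of the recursion in Eqs. (23)-(27).

    x      : observed samples
    n      : predictor order
    n_bits : codebook size in bits (N)
    Returns (theta, phi, indices) after the last sample.
    """
    size = n + n_bits
    codebook = [list(c) for c in product((1, -1), repeat=n_bits)]
    psi = [0.0] * size                                   # Psi_hat(0) = 0
    P = [[d if r == c else 0.0 for c in range(size)] for r in range(size)]
    indices = []
    for k in range(n, len(x)):
        past = [x[k - i] for i in range(1, n + 1)]       # X(k-1)
        # Eqs. (23)-(24): try every code vector, keep the smallest error.
        best = None
        for index, code in enumerate(codebook):
            v = past + code                              # V(k-1)
            x_hat = -sum(a * b for a, b in zip(v, psi))
            err = x[k] - x_hat
            if best is None or abs(err) < abs(best[0]):
                best = (err, index, v)
        err, index, v = best
        indices.append(index)
        # Eq. (25): update gain G(k).
        Pv = [sum(P[r][c] * v[c] for c in range(size)) for r in range(size)]
        denom = lam + sum(a * b for a, b in zip(v, Pv))
        G = [p / denom for p in Pv]
        # Eq. (26): P <- (P - G (P v)^T) / lam, using the symmetry of P.
        P = [[(P[r][c] - G[r] * Pv[c]) / lam for c in range(size)]
             for r in range(size)]
        # Eq. (27), with the sign implied by the regressor -V(k-1)
        # (an assumption of this sketch).
        psi = [p - g * err for p, g in zip(psi, G)]
    return psi[:n], psi[n:], indices


theta, phi, indices = amm_rls([0.0, 1.0], n=1, n_bits=1)
print(theta, phi, indices)
```

The cost per sample is dominated by the $2^N$ codebook trials, which is why the codebook size matters for real-time use.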

In the conventional linear predictive coding, the coefficients $\Theta$ are computed alone using
the open-loop formulation via the autocorrelation method or covariance method [1, 25] to
minimize the prediction error. Other techniques use cepstrum analysis to obtain the
prediction coefficients, for example homomorphic analysis, but the computational complexity is
a considerable problem in practice. After the coefficients $\Theta$ are obtained, the input can then
be modeled either using vector quantization or any other method to quantize the prediction
error, but most methods involve an a priori assumption about the type of the stochastic process.
In contrast, our proposed method updates the equalizer coefficients $\Theta$ and the gain
coefficients $\Phi$ of the input simultaneously. The application of the constant modulus property
to constrain the prediction error can be seen as a novel quantization methodology for the
input. The input codebook can thus be derived analytically without
involving any assumption about the stochastic properties of the prediction error.

2.3.1 Open-loop prediction

Given the estimated coefficient vector $\hat{\Psi}$ determined from equation (27), the open-loop
prediction equation similar to equation (23) is

$$\hat{x}(k) = -V^T(k-1)\hat{\Psi} = -\begin{bmatrix} X^T(k-1) & \Delta^T(k) \end{bmatrix} \begin{bmatrix} \hat{\Theta} \\ \hat{\Phi} \end{bmatrix} \tag{29}$$

or equivalently

$$\hat{x}(k) = -\sum_{i=1}^{n} \theta_i x(k-i) - \sum_{i=1}^{N} \phi_i \delta_i(k) \tag{30}$$

where $\theta_i$ and $\phi_i$ are constant quantities. The open-loop prediction uses the observed signal
to compute the signal prediction. The predicted value $\hat{x}(k)$ is computed using the past
observed signal $x(k-1), \ldots, x(k-n)$ and the input $u(k)$. The binary sequence $\Delta(k) =
[\delta_1(k) \;\; \delta_2(k) \;\; \cdots \;\; \delta_N(k)]^T$ of dimension $N \times 1$ is chosen from the $N$-bit codebook of dimension
$N \times 2^N$ with a total of $2^N$ different codevectors in the codebook. At each time $k$, only an
index from the $2^N$ possible choices is stored. From here on, Eq. (30) is used as the predictive
model of the adaptive multi-modulus blind equalization for open-loop prediction.

For the case where $N = 1$, Eq. (30) becomes

$$\hat{x}(k) + \sum_{i=1}^{n} \theta_i x(k-i) = -\phi_1 \delta_1(k) \tag{31}$$

The binary signal $\delta_1(k)$ is either $1$ or $-1$, which is almost identical to the output of the
equalizer shown in Eq. (3) with its coefficients computed by minimizing the cost function
defined in equation (4). The main difference is that the coefficients $\theta_1, \ldots, \theta_n$ together with
$\phi_1$ in equation (31) are obtained by minimizing a global cost function rather than a local
cost function at each time step $k$.

There are a total of $N$ constant gain coefficients $\phi_1, \ldots, \phi_N$ for computing the output of
the blind equalizer in Eq. (30), where $|\phi_i|$ is the $i$-th modulus. The open-loop prediction,
Eq. (30), may thus be called the adaptive multi-modulus blind predictor or, more precisely,
the adaptive $N$-modulus blind predictor. In [15], the multiple modulus concept was proposed
to deal with blind equalization of signals such as M-ary QAM (quadrature amplitude
modulation). The proposed approach in this paper differs from the approaches in [15, 16]
by not presetting the modulus. Furthermore, our proposed approach is a combination of
recursive least-squares and the Decision Directed (DD) [9, 10] equalizer. In [15] the first
proposed algorithm, the MMA (multiple modulus algorithm), uses a straightforward
generalization of the cost function for the Constant Modulus Algorithm (CMA) [7, 8] to derive
its update. The second one, DAMA (decision adjusted modulus algorithm), is a hybrid of
the CMA and the DD approaches.

2.3.2 Closed-loop prediction for speech coding 

In speech coding, the objective is to use the least number of bits in the digital representation
of the speech signal, that is, to compress the signal $x(1), x(2), \ldots, x(\ell)$ where $\ell$ is the length
of the signal. The open-loop prediction cannot be used for this purpose, because it requires
the observed samples themselves. It can only be
done by the closed-loop prediction equation, i.e.,

$$\hat{x}(k) = -\sum_{i=1}^{n} \theta_i \hat{x}(k-i) - \sum_{i=1}^{N} \phi_i \delta_i(k); \quad k = 1, \ldots, \ell \tag{32}$$

where the initial values $\hat{x}(0), \hat{x}(-1), \ldots, \hat{x}(-n+1)$ are set to zero. For a better prediction,
one may shift the starting prediction point from $k = 1$ to $k = n + 1$ and set the first $n$
points to be the actual signal. However, this requires a few more bits, since the additional $n$ initial
points must be stored and transmitted for speech coding.
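A decoder built on the closed-loop equation (32) only needs the coefficients, the gains, and one code vector per sample. The following sketch shows that recursion with zero initial conditions; the function name `synthesize` and the list-based interface are assumptions of this example.

```python
def synthesize(theta, phi, codes):
    """Closed-loop synthesis following Eq. (32).

    theta : predictor coefficients theta_1..theta_n
    phi   : gain coefficients phi_1..phi_N
    codes : one code vector (entries +1/-1, length N) per sample
    The n synthesized samples before k = 1 are taken as zero.
    """
    n = len(theta)
    history = [0.0] * n              # history[0] holds x_hat(k-1)
    output = []
    for code in codes:
        feedback = sum(t * h for t, h in zip(theta, history))
        excitation = sum(p * d for p, d in zip(phi, code))
        x_hat = -feedback - excitation
        output.append(x_hat)
        # Shift the newest synthesized sample into the history.
        history = [x_hat] + history[:-1]
    return output


print(synthesize([-0.5], [1.0], [[-1], [1], [-1]]))
```

Unlike the open-loop form of Eq. (30), the feedback term here uses previously synthesized samples, so no observed signal is needed at the receiving end.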

From equation (32), it is clear that the speech signal can be represented by the $n$
equalizer coefficients $\theta_1, \ldots, \theta_n$, the $N$ gain coefficients $\phi_1, \ldots, \phi_N$, and the $\ell$ $N$-bit binary
input sequences. Assume that each parameter can be accurately quantized by an
$M$-bit binary number. Then a total of $M(n+N) + N\ell$ bits plus the codebook will
be able to represent the speech signal of length $\ell$. If the sampling rate is 8 kHz, the $\ell$ samples
span $\ell/8000$ s, so the resulting bit rate is $8[M(n+N) + N\ell]/\ell$ kb/s.
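As a worked example of this bit count, the sketch below evaluates the total for the simulation settings used later in the paper ($n = 10$, $\ell = 6650$) with a 4-bit codebook. The word length $M = 16$ is a hypothetical value chosen only for illustration, and the rate is computed as total bits divided by the signal duration $\ell / f_s$, which is an assumption about how the rate should be normalized.

```python
def coded_bits(n, n_bits, length, word_bits):
    """Total bits: M(n + N) for the parameters plus N bits per sample."""
    return word_bits * (n + n_bits) + n_bits * length


def bit_rate_kbps(total_bits, length, sample_rate_hz=8000):
    """Bit rate in kb/s for a signal of `length` samples."""
    duration_s = length / sample_rate_hz
    return total_bits / duration_s / 1000.0


total = coded_bits(n=10, n_bits=4, length=6650, word_bits=16)
print(total, round(bit_rate_kbps(total, 6650), 2))
```

For long signals the parameter overhead $M(n+N)$ becomes negligible and the rate approaches $N$ bits per sample, i.e. $8N$ kb/s at 8 kHz, before any variable-frame reduction.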

2.4 Variable frame of input signal 

To reduce the amount of data that must be transmitted, the conventional coders usually
deal with the speech by frames (blocks) of samples. However, this may not be the best
way to describe the non-stationary behavior of the sound sources. On the other hand,
coding performed sample by sample results in a larger amount of data to be
transmitted, but such precise sample-by-sample coding can produce high
quality with negligible coding distortion and delay. The blind adaptive prediction was
originally proposed to carry out the prediction on a sample-by-sample basis to overcome
the non-stationarity problems.

Here we introduce a variable frame concept for transmission and storage of the input signal.
The sample-by-sample scheme is used to do precise coding and obtain high quality.
Then, the input (codevector) sequence $\Delta(k)$ for $k = 1, \ldots, \ell$ is blocked into frames to be
transmitted or stored. Each frame contains an identical index from the binary codebook.
The length of each frame varies. Only the beginning point and the length of each frame, together with
its codebook index, need to be stored and transmitted. Hence, the number of bits needed to represent the
speech signal can be reduced. This may greatly reduce the bit rate for transmitting or the
space for storing the input signal.

Figure 3: Original speech signal 'while new'
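The variable-frame idea amounts to run-length grouping of the per-sample codebook indices. A minimal sketch, assuming each frame is stored as a `(start, length, index)` tuple (a representation chosen here for illustration), is:

```python
def to_frames(indices):
    """Group runs of identical codebook indices into variable-length frames.

    Each frame is a tuple (start, length, index).
    """
    frames = []
    for k, index in enumerate(indices):
        if frames and frames[-1][2] == index:
            start, length, _ = frames[-1]
            frames[-1] = (start, length + 1, index)   # extend current run
        else:
            frames.append((k, 1, index))              # open a new frame
    return frames


def from_frames(frames):
    """Expand frames back into the per-sample index sequence."""
    out = []
    for _, length, index in frames:
        out.extend([index] * length)
    return out


frames = to_frames([3, 3, 3, 0, 0, 5])
print(frames)
print(from_frames(frames))
```

The saving depends on how often consecutive samples select the same code vector; if the index changes at every sample, the framed form is larger than the raw index stream.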

3 SIMULATION 

The simulation is performed using a 10-th order equalizer for a speech signal shown in 
figure 3 where the total sample number is 6650. Several binary codebooks of different bit 
numbers are generated for searching the optimum input sequence. We set the forgetting 
factor A = 0.999 and the initial value of P(0) = 1000In. First, the distribution of the 
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Figure 4: Mean squared error via different bit number codebooks 


reconstructed error with various bit codebooks is examined. Figure 4 shows the plot of the 
mean squared error between the synthesized signal x(k) and its original signal x(k), i.e. , 
Y%=: i( x (k) ~ x(&)) 2 /6650. When the bit number increases, there is a significant decrease 
in the mean squared error. However, it also leads to a considerable computational cost 
because of the search in an expanded codebook. As a 10-bit Gaussian codebook is adopted 
in CELP [3], we also focus on the bit number less than 10. This is due to the fact that 
for those codebooks with bit number more than 10 (where a 10-bit codebook consists of 
2 10 = 1024 codevectors), the number of codevectors increases exponentially with the number 
of bits resulting in a high computational cost while searching for the optimum codevector. 
Let us now examine the results using a 4-bit codebook and an 8-bit codebook. For the 4-bit 
codebook, the gain vector for the input is plotted in figure 5, and the coefficients 
of the equalizer are shown in figure 6. The synthesized speech generated by the closed-loop 
prediction can be found in figure 7. Figure 8 shows the error between the original signal 
and the synthesized one, obtained simply by subtraction. Since the recursive algorithm involves 
coefficient smoothing, all parameters converge to constant values. The gain vector 
approaches a constant value after about 2000 samples. These well-behaved gains indicate 
that the number of bits employed for coding is sufficient. The synthesized speech signal 
retains good quality, as expected. Despite the remaining error, listeners can still 
hear it clearly. 

Figure 5: The gain vector for 4-bit coding 
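
The role of the forgetting factor and of the large initial value of P(0) can be seen in a conventional recursive least-squares update. The sketch below is a generic textbook RLS step written in plain Python, not the combined equalizer and excitation algorithm of this report. The function names, the two-parameter test system, and the sinusoidal regressors are assumptions chosen only for illustration.

```python
import math


def rls_step(theta, P, phi, y, lam=0.999):
    """One conventional RLS update with exponential forgetting factor lam.

    theta: current parameter estimate (list of n floats)
    P:     current n x n matrix (list of lists), assumed symmetric
    phi:   regressor vector for this sample
    y:     measured output for this sample
    """
    n = len(theta)
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [value / denom for value in P_phi]
    error = y - sum(theta[i] * phi[i] for i in range(n))
    theta_new = [theta[i] + gain[i] * error for i in range(n)]
    # Because P is symmetric, phi^T P equals (P phi)^T.
    P_new = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)]
             for i in range(n)]
    return theta_new, P_new


def identify_demo(steps=200):
    """Identify a hypothetical system y = 2.0*u1 - 0.5*u2 from noise-free data."""
    theta = [0.0, 0.0]
    P = [[1000.0, 0.0], [0.0, 1000.0]]  # P(0) = 1000 * I, as in the simulation setup
    for k in range(steps):
        phi = [math.sin(0.3 * k), math.cos(0.7 * k)]
        y = 2.0 * phi[0] - 0.5 * phi[1]
        theta, P = rls_step(theta, P, phi, y)
    return theta


if __name__ == "__main__":
    print(identify_demo())
```

A large P(0) expresses low confidence in the initial estimate, so the first samples move the parameters quickly; a forgetting factor close to 1 discounts old data only slowly.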

The results obtained from the 8-bit codebook are shown in figures 9, 10, 11, and 12. 
The adapted gain vector is shown in figure 9, and the coefficients of the equalizer can 
be found in figure 10. The synthesized speech signal is in figure 11. In figure 12, we 
show the error between the original speech and synthesized one. The primary means of 
measuring performance is through subjective testing by listeners. Consistent with the results 
in figures 8 and 12, the quality of 8-bit coding is better than that of 4-bit coding. It is hard 
to distinguish any audible difference between the original and the synthesized signal using 
8-bit coding, which indicates that the quality of the synthesized speech is reliable. 
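
Alongside listening tests, the error signals can be compared numerically with the mean squared error used for figure 4. The following is a minimal sketch; the function name and the sample values are illustrative and are not data from the simulation.

```python
def mean_squared_error(original, synthesized):
    """Average of the squared sample-by-sample differences between two signals."""
    if len(original) != len(synthesized):
        raise ValueError("signals must have the same length")
    if not original:
        raise ValueError("signals must not be empty")
    total = sum((x - x_hat) ** 2 for x, x_hat in zip(original, synthesized))
    return total / len(original)


if __name__ == "__main__":
    # Made-up sample values, used only to exercise the function.
    print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))
```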

Figure 12: The prediction error for 8-bit coding 

Revisiting the result in figure 4 for bit numbers less than 10, the mean squared 
errors are, in fact, only slightly different, except for those from the 1-bit and 2-bit 
codebooks. Let us compare and discuss these cases in detail. From the geometric theory discussed in the 
previous section, the vector X must be representable in terms of the space generated by 
the columns of $V^T$ to obtain an optimal solution that minimizes the error between 
the original speech and its synthesized one. Increasing the number of bits N is meant 
to expand the dimension of the space generated by X. Theoretically, the larger the N-bit 
representation is, the better the optimal solution should be. However, the computational 
cost also increases. An optimal N-bit representation depends on the desired quality of 
the synthesized signal, the computational cost, and the data transmission rate. Let us compare 
the gain vectors for the input signal in several different cases. Recall the gain 
vectors shown in figures 5 and 9, and plot the gain vectors resulting from a 1-bit codebook, 
a 2-bit codebook, and a 6-bit codebook in figures 13, 14, and 15, respectively. Clearly, the 
1-bit or 2-bit representation is not sufficient to produce a converged solution for the gain 
vector; that is, the space generated by V is not large enough and the solution is poor. For a 
4-bit representation, the solution improves considerably. The cases with 6, 8, or even 
10 bits show approximately the same results, although the extra bits enhance the 
quality of synthesis. Beyond 10 bits, the quality improves only slightly, but the computational 
cost increases considerably for each bit added. 
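
The growth of the search cost with the number of bits can be sketched with an exhaustive codebook search. The sketch assumes, purely for illustration, that an N-bit codevector is a tuple of N entries equal to 0 or 1 and that the excitation sample is the inner product of the codevector with a gain vector. The report's exact codebook construction and error criterion are not reproduced here, and the function names and gain values are hypothetical.

```python
from itertools import product


def build_binary_codebook(n_bits):
    """Enumerate all 2**n_bits binary codevectors (0/1 entries are an assumption)."""
    return list(product((0, 1), repeat=n_bits))


def best_codevector(target, gains, codebook):
    """Exhaustively pick the codevector whose weighted sum is closest to target."""
    def squared_error(code):
        approximation = sum(g * b for g, b in zip(gains, code))
        return (target - approximation) ** 2

    return min(codebook, key=squared_error)


if __name__ == "__main__":
    for n_bits in (1, 2, 4, 6, 8, 10):
        print(n_bits, len(build_binary_codebook(n_bits)))

    gains = [0.5, 0.25, 0.125, 0.0625]  # illustrative gain values only
    print(best_codevector(0.7, gains, build_binary_codebook(4)))
```

Every added bit doubles the number of codevectors that the search must evaluate, which is why the cost rises sharply beyond roughly 10 bits.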

In the following, we introduce an improved scheme that raises the limited quality without 
significantly increasing the cost of computation. As in differential pulse code 
modulation (DPCM) [26], speech quality in LPC can be improved at the expense of a higher 
bit rate by computing and transmitting a residual error. We employ a low-bit codebook to 
encode both the signal and its residual. A 4-bit codebook, as discussed earlier, is sufficient 
to obtain a good solution for the gain vector of the input signal. The error shown in figure 8 
is encoded again by the same 4-bit binary codebook. Figure 17 shows that the error 
of the synthesis is reduced when compared to that in figure 12. Moreover, when listening 
to them, the quality of the improved scheme is better than that obtained by encoding the 
signal with a single 8-bit codebook. 

Figure 17: The prediction error for 8-bit coding by improved scheme 

In this improved scheme, we repeat the encoding process on the prediction residual, 
and thus the computational cost of encoding is doubled. Since the proposed algorithm is 
computationally efficient, this cost increase is still acceptable. Because we split one 2N-bit 
coding into two N-bit codings, the codebook for the optimal search is reduced by a factor of 
$2^N$ in size. The computational cost of two N-bit codings is still much lower than that of 
one 2N-bit coding. Moreover, it was found that beyond a certain number of bits in the codebook, 
the quality of the synthesized signal does not improve. As shown in figure 5, the gain of the input 
is as stable as that in figure 9, implying that the space spanned by X with 4-bit coding 
is sufficient to produce a good solution. By applying the multi-step coding, 
the quality of two 4-bit codings is considerably better than that of one 8-bit coding. 
This improved scheme not only enhances the quality of speech for the same 8 bits stored 
or transmitted, but also decreases the computational cost during the search for an optimum 
8-bit input sequence. Figure 18 shows the sizes of the 4-bit and 8-bit codebooks. 

Figure 18: Codebooks for searching an optimum input sequence (a) improved scheme (b) 
original scheme 
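
The saving from splitting one 2N-bit search into two N-bit searches can be checked with a short count. This is a rough sketch that only counts the codevectors examined per search; it does not model the full operation count of the algorithm, and the function name is hypothetical.

```python
def search_sizes(n_bits):
    """Return (two_stage, single_stage) codevector counts.

    two_stage:    two searches over an n_bits codebook (signal, then residual)
    single_stage: one search over a 2*n_bits codebook
    """
    two_stage = 2 * 2 ** n_bits
    single_stage = 2 ** (2 * n_bits)
    return two_stage, single_stage


if __name__ == "__main__":
    two_stage, single_stage = search_sizes(4)
    print(two_stage, single_stage, single_stage // two_stage)
```

For N = 4, the two-stage scheme examines 32 codevectors instead of 256, while the same 8 bits are stored or transmitted.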

4 CONCLUSIONS 

In this report, a new method is developed and implemented for speech coding and synthesis. 
Conventional linear predictive coding requires two computational steps, i.e., coefficient 
estimation of an all-pole model and quantization of the prediction residual. The model 
coefficients are estimated by minimizing the mean squared error. The prediction residual 
is quantized to be used as the input signal during the process of speech coding and signal 
synthesis. On the other hand, the blind adaptive predictive coding estimates the coefficients 
of the blind equalizer with the assumption that the output of the equalizer (i.e., the input 
for speech coding) is a priori fixed at uncertain values. It generally produces a poor estimate 
of the coefficients and a poor quality of the signal synthesis. 

In contrast, the proposed method uses the deterministic approach to simultaneously 
estimate the combined coefficients of the blind equalizer and the binary input excitation. 
The linear geometric theory is used to establish the theoretical background of the proposed 
method. The combined coefficients are estimated by minimizing the angle between a vec- 
tor in the direction of the signal and the space generated by the shifted signal and the 
binary input. A recursive algorithm is introduced for computational ease and real-time 
implementation. Simulations have shown that the quality of the synthesized signal can be 
significantly improved from 1-bit coding to 4-bit coding. Beyond 4-bit coding, the quality 
improves only slightly but the computational cost increases considerably. To overcome this 
problem, the encoding process is repeated on the prediction residual using the same 4-bit 
codebook. At much lower computational cost, this repeated process yields a synthesized signal 
with the 4-bit codebook whose quality is better than that obtained with 8-bit coding. 

The proposed technique provides a totally different framework for voice coders. The 
concept based on the linear geometric theory opens a new direction for exploring more 
fundamental work and applications. 